Protein Tau’s Role in Gene Expression

Group 25: Ana Moral (s232119), Jacqueline Printz (s194377), Jenni Kinnunen (s204697), João Prazeres (s243036), William Gunns (s242051)

1. Introduction - Protein Tau

  • Function: Microtubule-associated protein essential for cytoskeletal stability and neuronal transport.

  • Supports healthy neuronal functions.

  • Destabilization is linked to neuronal dysfunction and Alzheimer’s disease.

  • Previous studies concluded that Tau destabilization led to an alteration in the expression of glutamatergic genes.

Experimental Objective:

Is the overexpression of Tau associated with alterations in gene expression?

Tau Protein Diagram

2. Experimental Setup

Differential gene expression analysis of RNA-seq data performed on:

  • Control: 3 samples of SH-SY5Y cells with overexpression of a control vector.
  • Experimental Condition: 3 samples of SH-SY5Y cells with overexpression of Tau 0N4R isoform.

RNA-seq data was reported in 3 xls sheets:

  • Read Counts.
  • RPM (Reads Per Million).
  • RPKM (Reads Per Kilobase per Million mapped reads).
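
How the two normalised measures relate to raw read counts can be sketched as follows. This is a minimal illustration, not the pipeline that produced the sheets: the counts and gene lengths are invented, and the sequencing depth is taken as the sum of the three toy counts, whereas in practice it would be the total number of mapped reads in the library.

```r
# Toy example: raw read counts for three genes in one sample.
# All numbers, including the gene lengths, are invented for illustration.
counts    <- c(geneA = 500, geneB = 1500, geneC = 8000)
length_bp <- c(geneA = 1000, geneB = 3000, geneC = 2000)

# RPM: scale counts by sequencing depth, expressed in millions of reads.
depth_millions <- sum(counts) / 1e6
rpm <- counts / depth_millions

# RPKM: additionally scale by gene length, expressed in kilobases.
rpkm <- rpm / (length_bp / 1000)

print(rpm)
print(rpkm)
```

RPM only corrects for sequencing depth, so it is suited to comparing the same gene across samples; RPKM also corrects for gene length, which matters when comparing different genes within a sample.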

The 3 sheets were joined into one large tibble data frame.

# A tibble: 58,395 × 9
   ...1     GeneName description SH_ctrl_1 SH_ctrl_2 SH_ctrl_3 SH_tau_1 SH_tau_2
   <chr>    <chr>    <chr>           <dbl>     <dbl>     <dbl>    <dbl>    <dbl>
 1 ENSG000… TSPAN6   tetraspani…       319       582       280      214      189
 2 ENSG000… TNMD     tenomoduli…         0         0         0        0        0
 3 ENSG000… DPM1     dolichyl-p…       792      1556       781      521      502
 4 ENSG000… SCYL3    SCY1 like …       517       561       445      323      365
 5 ENSG000… C1orf112 chromosome…       533       537       566      601      584
 6 ENSG000… FGR      FGR proto-…         0         0         0        2        2
 7 ENSG000… CFH      complement…         2         0         1        0        0
 8 ENSG000… FUCA2    alpha-L-fu…       487       761       447      341      321
 9 ENSG000… GCLC     glutamate-…       430       703       246      233      218
10 ENSG000… NFYA     nuclear tr…      1101      1156       760      898      583
# ℹ 58,385 more rows
# ℹ 1 more variable: SH_tau_3 <dbl>

3. Data Wrangling

First, the data was cleaned by:
1. joining all data frames into one
2. renaming columns
3. removing unnecessary and invalid observations (including descriptions and rows of data that are all zero)

Key functions used include full_join, mutate, rename and select. After this step the data is tidy: each row corresponds to an observation, each column to a variable and each cell to a value.

Could insert picture of ‘clean’ data - lecture style.
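
The cleaning steps can be sketched with dplyr on two toy sheets. This is an assumption-laden illustration rather than the project's actual script: the values, the `gene_id` column name and the `counts_`/`rpm_` prefixes are hypothetical, and only two of the three sheets and two of the six samples are shown.

```r
library(dplyr)

# Toy stand-ins for two sheets. Sample column names follow the tibble
# shown earlier; gene_id and all values are invented.
counts <- tibble(
  gene_id     = c("g1", "g2", "g3"),
  GeneName    = c("A", "B", "C"),
  description = c("desc a", "desc b", "desc c"),
  SH_ctrl_1   = c(319, 0, 792),
  SH_tau_1    = c(214, 0, 521)
)
rpm <- tibble(
  gene_id   = c("g1", "g2", "g3"),
  SH_ctrl_1 = c(10.2, 0, 25.3),
  SH_tau_1  = c(7.1, 0, 17.4)
)

clean <- counts %>%
  # Tag each measurement column with its data type before joining.
  rename(counts_ctrl_1 = SH_ctrl_1, counts_tau_1 = SH_tau_1) %>%
  full_join(
    rename(rpm, rpm_ctrl_1 = SH_ctrl_1, rpm_tau_1 = SH_tau_1),
    by = "gene_id"
  ) %>%
  # Drop the free-text description column.
  select(-description) %>%
  # Remove genes whose measurements are all zero
  # (values are non-negative, so a row sum of 0 means all zero).
  mutate(total = rowSums(across(matches("_(ctrl|tau)_")))) %>%
  filter(total > 0) %>%
  select(-total)

print(clean)
```

Renaming before the join avoids the `.x`/`.y` suffixes that full_join would otherwise add to identically named sample columns.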

The data was then log-transformed so that observations with small margins of difference could be analysed in more detail.

Average across replicates of each experiment (and variance): this bit probably belongs in Augment.
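
A log transformation of the measurement columns could look like the sketch below. The base (2) and the pseudocount of 1 are assumptions made for this example; the slides do not state which were used. The table and its values are invented.

```r
library(dplyr)

# Toy expression table; values are invented.
expr <- tibble(
  gene_id = c("g1", "g2", "g3"),
  ctrl_1  = c(0, 7, 1023),
  tau_1   = c(1, 15, 511)
)

# log2(x + 1): the pseudocount of 1 is an assumption made here so that
# zero values stay defined (log2(0) would be -Inf).
log_expr <- expr %>%
  mutate(across(c(ctrl_1, tau_1), ~ log2(.x + 1)))

print(log_expr)
```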

4. Data Augment

Statistics data frame

  • Variance across the 3 replicates for each attribute

Filtered data

  • Genes with mean variance > 1 were filtered out

  • For the rest, average of replicates was kept

    • Data frame 3 times smaller
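
The augment and filter steps above can be sketched as follows. This is one possible reading, not the project's code: "mean variance" is interpreted here as the replicate variance averaged over a gene's conditions, and the long-format column names and all values are invented.

```r
library(dplyr)

# Toy long-format data: one row per gene, condition and replicate.
# Values are invented; column names are illustrative.
long <- tibble(
  gene_id   = rep(c("g1", "g2"), each = 6),
  condition = rep(rep(c("ctrl", "tau"), each = 3), times = 2),
  replicate = rep(1:3, times = 4),
  value     = c(5.0, 5.2, 4.8,  6.0, 6.1, 5.9,   # g1: tight replicates
                2.0, 6.0, 9.0,  1.0, 1.2, 0.8)   # g2: noisy control
)

# Statistics per gene and condition: mean and variance over replicates.
stats <- long %>%
  group_by(gene_id, condition) %>%
  summarise(mean_value = mean(value), variance = var(value), .groups = "drop")

# Keep genes whose variance, averaged over conditions, is at most 1.
keep <- stats %>%
  group_by(gene_id) %>%
  summarise(mean_variance = mean(variance), .groups = "drop") %>%
  filter(mean_variance <= 1)

# One averaged value per gene and condition instead of three replicates.
filtered <- stats %>%
  semi_join(keep, by = "gene_id") %>%
  select(gene_id, condition, mean_value)

print(filtered)
```

Replacing three replicate rows by their mean is what makes the resulting data frame three times smaller, before any genes are dropped by the variance filter.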

5. Data Description part 1

All data

  • 522,648 observations, 3 attributes

  • 29,036 genes

  • 18 experiments, each with 3 replicates

Filtered data

  • All data: all genes, no replicates
  • Filtered data: significant genes, no replicates (for overview of the data)
  • Log data: significant genes, all 3 replicates (for PCA)
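
A long layout with three attributes per observation, as in the "All data" set, could be produced with tidyr's pivot_longer. The sketch below is illustrative only: the column names and values are invented and the toy table has 4 measurement columns rather than the full set. For reference, 29,036 genes × 18 measurement columns gives the 522,648 rows reported.

```r
library(dplyr)
library(tidyr)

# Toy wide table: 2 genes x 4 measurement columns (values invented).
wide <- tibble(
  gene_id       = c("g1", "g2"),
  counts_ctrl_1 = c(319, 792),
  counts_tau_1  = c(214, 521),
  rpm_ctrl_1    = c(10.2, 25.3),
  rpm_tau_1     = c(7.1, 17.4)
)

# Long layout with three attributes: gene, experiment label, value.
long <- wide %>%
  pivot_longer(-gene_id, names_to = "experiment", values_to = "value")

# Rows = genes x measurement columns, here 2 x 4 = 8.
print(dim(long))
```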

6. Data Description part 2

Insert picture of plot.

Similar description to that in the report.

7. Analysis PCA

The objective was to confirm that read counts, RPM and RPKM give the same results, and that there is in fact a difference between the control and the Tau condition.

We performed 3 PCAs, one on each type of result (the first 3 plots), and 1 final PCA on all of them together.
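
A single PCA of this kind can be sketched with base R's prcomp. The data below are simulated, not the project's: six samples and 50 genes, with an artificial shift added to the Tau samples so that the separation is visible. The sample labels and the choice to centre but not scale are assumptions made for the example.

```r
# Toy log-expression matrix: 6 samples (rows) x 50 genes (columns).
# Data are simulated; the tau samples get a shift in the first 10 genes.
set.seed(1)
expr <- matrix(
  rnorm(6 * 50), nrow = 6,
  dimnames = list(c(paste0("ctrl_", 1:3), paste0("tau_", 1:3)),
                  paste0("gene", 1:50))
)
expr[4:6, 1:10] <- expr[4:6, 1:10] + 5

# PCA on samples: centre each gene, then project onto principal components.
pca <- prcomp(expr, center = TRUE, scale. = FALSE)

# Share of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on PC1 and PC2, with the condition as a label.
scores <- data.frame(
  sample    = rownames(expr),
  condition = sub("_[0-9]+$", "", rownames(expr)),
  PC1       = pca$x[, 1],
  PC2       = pca$x[, 2]
)

print(scores)
print(round(var_explained[1:2], 2))
```

Running the same steps separately on the read-count, RPM and RPKM columns, and once on all of them, would give the four PCAs described; if control and Tau samples fall on opposite sides of PC1 in each case, the data types agree.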

8. Analysis PCA

Maybe for plots

9. Gene Set Enrichment Analysis

Text

10. Discussion based on the GSEA/conclusion

Which genes were overexpressed? Is this consistent with the literature?

Challenges

  • [Challenge 1]
  • [Challenge 2]

Limitations

  • [Limitation 1]
  • [Limitation 2]